strange case
Dr Jekyll & Mr Hyde: the strange case of off-policy policy updates
The policy gradient theorem states that the policy should only be updated in states that are visited by the current policy, which leads to insufficient planning in the off-policy states, and thus to convergence to suboptimal policies. We tackle this planning issue by extending the policy gradient theory to policy updates with respect to any state density. Under these generalized policy updates, we show convergence to optimality under a necessary and sufficient condition on the updates' state densities, and thereby solve the aforementioned planning issue. We also prove asymptotic convergence rates that significantly improve on those in the policy gradient literature. To implement the principles prescribed by our theory, we propose an agent, Dr Jekyll & Mr Hyde (J&H), with a double personality: Dr Jekyll purely exploits while Mr Hyde purely explores. J&H's independent policies make it possible to record two separate replay buffers, one on-policy (Dr Jekyll's) and one off-policy (Mr Hyde's), and therefore to update J&H's models with a mixture of on-policy and off-policy updates. More than an algorithm, J&H defines principles for actor-critic algorithms to satisfy the requirements we identify in our analysis. We extensively test on finite MDPs, where J&H demonstrates a superior ability to recover from converging to a suboptimal policy without impairing its speed of convergence. We also implement a deep version of the algorithm and test it on a simple problem, where it shows promising results.
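The two-buffer idea in this abstract can be illustrated with a minimal Python sketch. The abstract does not say how the mixture of on-policy and off-policy updates is formed, so the fixed mixing fraction, the function name `sample_mixed_batch`, and its parameters are all assumptions made for illustration, not the authors' implementation.

```python
import random


def sample_mixed_batch(on_policy_buffer, off_policy_buffer,
                       batch_size, on_policy_fraction, rng=None):
    """Draw a training batch that mixes two replay buffers.

    Hypothetical helper: the fixed `on_policy_fraction` is an assumption;
    the abstract only states that both kinds of updates are mixed.
    """
    rng = rng or random.Random()
    # Number of transitions taken from the exploiting policy's buffer.
    n_on = round(batch_size * on_policy_fraction)
    # The remainder comes from the exploring policy's buffer.
    n_off = batch_size - n_on
    # Sample with replacement from each buffer.
    batch = [rng.choice(on_policy_buffer) for _ in range(n_on)]
    batch += [rng.choice(off_policy_buffer) for _ in range(n_off)]
    # Shuffle so the update does not see the two sources in blocks.
    rng.shuffle(batch)
    return batch
```

For example, with `batch_size=8` and `on_policy_fraction=0.25`, the batch holds two transitions from the on-policy buffer and six from the off-policy buffer. Sampling states from the off-policy buffer is one simple way to obtain updates in states the exploiting policy rarely visits.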
Logical Judges Challenge Human Judges on the Strange Case of B.C.-Valjean
Mascardi, Viviana, Pellegrini, Domenico
The connections between logic programming and law have been studied for a long time. In 1975, Meldman discussed his PhD thesis entitled "A preliminary study in computer-aided legal analysis" [12], where he modelled legal facts in a Lisp-like language and used instantiation (recalling unification) and syllogism (recalling resolution) to perform a simple kind of legal analysis inspired by Prosser's Law of Torts [13]. At that time Prolog was just born, but its applications to legal reasoning were not long in coming. One of the first attempts was made by Hustler [9], who implemented a prototype of a legal consultant in Prolog, again inspired by Prosser's work. A few years later, Kowalski, Sergot et al. succeeded in running a significant portion of the 1981 British Nationality Act, implemented in Prolog, on a small microcomputer [15]. In the same years, Prolog became very popular for implementing expert systems for the legal domain [3, 19]. Since those early attempts, much progress has been made: research on deontic and defeasible reasoning [1, 5], ontological reasoning [7], and argumentation [8, 18] is extremely lively and helps disclose the many connections between logic programming (and, more generally, computational logic and automated reasoning) and legal reasoning. The application of automated reasoning to digital forensics is another promising research direction [6] whose potential is witnessed by the ongoing "Digital Forensics: Evidence Analysis via Intelligent Systems and Practices" (DigForASP) COST Action
- Europe > Italy > Calabria (0.05)
- North America > United States > Wisconsin (0.04)
- North America > United States > Massachusetts (0.04)
- Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.67)
- Information Technology > Security & Privacy (0.55)
- Law > Criminal Law (0.48)